** Data reading and variable selection from raw data
** Polish General Social Survey 2002


** 01. Reading data **

cap log close
clear all
set more off
cd /*insert your work directory here*/
use /*read your data here*/ 


** 02. Constructing year and country variables **
keep if PGSSYEAR==2002

ge year=2002
lab var year "survey year"

ge country=616
lab var country "ISO country code"
//Poland: 616 (see "ISO Country Codes.pdf")


** 03. ID variables **

ge pid=RECORDID
lab var pid "person id"
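
// Optional check, not part of the original script: confirm that pid
// uniquely identifies observations. -capture noisily- reports a failure
// in the output without halting the do-file.
capture noisily isid pid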


** 04. Basic Demographics (Sex and Age/birth year) **

rename Q8 sex 
lab var sex "respondent sex"

ge age=Q9AGE
lab var age "age"

ge birthyr=year-age
lab var birthyr "year of birth"


** 05. Siblings **

ge nsibs=Q12A
lab var nsibs "number of siblings"
lab def nsibs -2 "not asked" 99 "no answer"
lab val nsibs nsibs

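//skip cases where nsibs itself holds a missing code (-2/99), so the code is not used in the arithmetic
ge birthorder=nsibs-Q12B+1 if !inlist(Q12B,-2,99) & !inlist(nsibs,-2,99)
replace birthorder=99 if inlist(Q12B,-2,99) | inlist(nsibs,-2,99)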
lab var birthorder "respondent birth order"
lab def birthorder 99 "not asked/no answer" 100 "don't know"
lab val birthorder birthorder


** 06. Own education **

//highest year of school completed
rename Q131ED eduy
lab var eduy "respondent's highest year of school completed"


** 07. Parents' education: Father and/or Mother **

//degree
rename Q14B faeduc
lab var faeduc "father: highest level of education completed"

rename Q16B moeduc
lab var moeduc "mother: highest level of education completed"

//year
rename Q14ED faeduy
lab var faeduy "father: highest year of schooling completed"

rename Q16ED moeduy
lab var moeduy "mother: highest year of schooling completed"


** 08. Own occupation **

//ISCO code
rename Q22ISCO occ
lab var occ "respondent occupation in ISCO 88 code"

rename Q22ISC9 occ_9
lab var occ_9 "respondent occupation in ISCO 88 code_in 9 categories"

rename Q22ISC27 occ_27
lab var occ_27 "respondent occupation in ISCO 88 code_in 27 categories"


** 09. Parents' occupation **

//ISCO code
rename Q13ISCO faocc
lab var faocc "father occupation in ISCO 88 code"

rename Q13ISC9 faocc_9
lab var faocc_9 "father occupation in ISCO 88 code_in 9 categories"

rename Q13ISC27 faocc_27
lab var faocc_27 "father occupation in ISCO 88 code_in 27 categories"

rename Q15ISCO moocc
lab var moocc "mother occupation in ISCO 88 code"

rename Q15ISC9 moocc_9
lab var moocc_9 "mother occupation in ISCO 88 code_in 9 categories"

rename Q15ISC27 moocc_27
lab var moocc_27 "mother occupation in ISCO 88 code_in 27 categories"


** 10. Tabulate the Identified Variables **

log using /*insert your work directory here*/, replace text

** Data reading and variable selection from raw data
** Polish General Social Survey 2002

** Sex **
tab sex

** Age, Birth Year **
sum age birthyr, d

** Siblings **
sum nsibs, d

** R's Own Education **
tab1 eduy

** Parental Education **
tab1 faeduc faeduy moeduc moeduy

** R's Own Occupation **
tab1 occ occ_9 occ_27

** Parental Occupation **
tab1 faocc faocc_9 faocc_27 moocc moocc_9 moocc_27

log close

** 11. Keep the identified variables only **

keep year country pid sex age birthyr ///
	 nsibs birthorder eduy faeduc faeduy moeduc moeduy ///
	 occ occ_9 occ_27 ///
	 faocc faocc_9 faocc_27 moocc moocc_9 moocc_27
	 

** 12. Save the Data File **

saveold /*insert your work directory here*/, replace



** 13. Homogenising education (by Manting Chen, 2017/4/3) **
** Own Education **
rename eduy educ_yrs
replace educ_yrs=. if educ_yrs==99
//own education local categories not available
ge educ_ISCED=000 if educ_yrs==0
replace educ_ISCED=020 if educ_yrs==4
replace educ_ISCED=100 if educ_yrs==8
replace educ_ISCED=244 if educ_yrs==10
replace educ_ISCED=354 if educ_yrs==12
replace educ_ISCED=354 if educ_yrs==14
replace educ_ISCED=766 if educ_yrs==17
lab var educ_ISCED "respondent's highest education in ISCED 2011 code"
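
// Optional check, not part of the original script: the mapping above only
// covers educ_yrs values 0, 4, 8, 10, 12, 14 and 17, so any other value
// leaves educ_ISCED missing. Tabulate the unmapped values, if any.
capture noisily tab educ_yrs if missing(educ_ISCED) & !missing(educ_yrs)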


** Parents' Education **

ge faeduc_flag=1 

rename faeduc faeduc_cat
rename faeduy faeduc_yrs
replace faeduc_yrs=. if faeduc_yrs==99 | faeduc_yrs==98
rename moeduc maeduc_cat
rename moeduy maeduc_yrs
replace maeduc_yrs=. if maeduc_yrs==99 | maeduc_yrs==98

ge faeduc_ISCED=000 if faeduc_yrs==0
replace faeduc_ISCED=020 if faeduc_yrs==4
replace faeduc_ISCED=100 if faeduc_yrs==8
replace faeduc_ISCED=244 if faeduc_yrs==10
replace faeduc_ISCED=354 if faeduc_yrs==12
replace faeduc_ISCED=354 if faeduc_yrs==14
replace faeduc_ISCED=766 if faeduc_yrs==17
lab var faeduc_ISCED "father's highest education in ISCED 2011 code"

ge maeduc_ISCED=000 if maeduc_yrs==0
replace maeduc_ISCED=020 if maeduc_yrs==4
replace maeduc_ISCED=100 if maeduc_yrs==8
replace maeduc_ISCED=244 if maeduc_yrs==10
replace maeduc_ISCED=354 if maeduc_yrs==12
replace maeduc_ISCED=354 if maeduc_yrs==14
replace maeduc_ISCED=766 if maeduc_yrs==17
lab var maeduc_ISCED "mother's highest education in ISCED 2011 code"
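
// Optional check, not part of the original script: tabulate parental years
// of schooling that the mappings above leave without an ISCED code.
capture noisily tab faeduc_yrs if missing(faeduc_ISCED) & !missing(faeduc_yrs)
capture noisily tab maeduc_yrs if missing(maeduc_ISCED) & !missing(maeduc_yrs)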


** 14. Homogenising siblings (by Manting Chen, 2017/4/3) **
//cutoff
ge nsibs_flag=99
lab var nsibs_flag "cutoff of total number of siblings"

lab def nsib_flag 99 "no cutoff"
lab val nsibs_flag nsib_flag

//recode missing
replace nsibs=. if nsibs==-2 | nsibs==99

//number of brothers/sisters not available


** 15. Tab Education and Sibling Variables **
tab1 sex age birthyr
tab1 educ_yrs faeduc_cat faeduc_yrs maeduc_cat maeduc_yrs faeduc_flag 
tab1 nsibs nsibs_flag


** 16. Save the Data File **

saveold /*insert your work directory here*/, replace
